Adaptive Feature Extraction Method for Degraded Character Recognition
Abstract
Most character recognition applications target machine-printed and handwritten characters on paper documents. Recently, demand for recognizing text in videos, web documents, and natural scenes has grown, and research has intensified because the task is difficult (Antonacopoulos & Hu, 2004; Doermann et al., 2003; Kise & Doermann, 2007; Lienhart & Wernicke, 2002; Lyu et al., 2005; Zhang & Kasturi, 2008). The problems posed by recognizing low-quality characters in these applications stem mainly from deformation, such as the variety of font styles and style effects, and from image degradation, such as background noise, blur, and low resolution. A key weakness of most conventional character recognition methods is that they tackle one problem or the other, not both. To overcome image degradation, some methods, e.g. (Ho, 1998; Kopec, 1997; Xu & Nagy, 1999), design templates that reflect the anticipated degradation type. A robust discriminant function for recognizing degraded characters was also proposed in (Sato, 2000; Sawaki & Hagita, 1998). Unfortunately, these methods are sensitive to shape deformation because they employ image-based template matching, and they fail to handle multiple fonts and style effects effectively. On the other hand, geometric features are often used for recognizing multiple fonts. Stroke direction is particularly effective against character deformation (Umeda, 1996). For example, the direction contribution based on stroke run-length is effective (Akiyama & Hagita, 1990; Srihari et al., 1997; Zhu et al., 1997). However, geometric features are not robust against the loss of information caused by image degradation. In addition, although geometric features are more robust against deformation than image-based template matching, they are not invariant to deformations such as aspect ratio fluctuation and stroke position shift.
Therefore, geometric features are weak against kinds of deformation that are not present in the training samples. To overcome the deformation problems mentioned above, nonlinear shape normalization techniques (Tsukumo & Tanaka, 1988; Yamada et al., 1990) have been proposed as a pre-processing step that relocates strokes uniformly. They normalize a pattern by exploiting the distance between strokes (Tsukumo & Tanaka, 1988) and stroke line density (Yamada et al., 1990), and are mainly aimed at recognizing Kanji characters, which consist of many strokes in mostly square patterns. Therefore, applying these methods to numerals, alphabetic characters, and kana characters, which consist of fewer strokes and are not square in shape, is difficult. These methods are also ineffective for degraded characters with background noise and blur be...
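The stroke run-length direction feature mentioned in the abstract can be illustrated with a minimal sketch. It assumes a binary image given as a list of rows (1 = stroke pixel), four stroke directions, and normalization of the four run-lengths by their Euclidean norm. The function names (`run_length`, `direction_contribution`, `direction_feature`) and the normalization are assumptions made for illustration, not the exact formulation of the cited works.

```python
from math import sqrt

# Four stroke directions as (row step, column step):
# horizontal, vertical, and the two diagonals. (Assumed set of directions.)
DIRECTIONS = [(0, 1), (1, 0), (1, 1), (1, -1)]


def run_length(img, r, c, dr, dc):
    """Length of the foreground run passing through (r, c) along (dr, dc)."""
    rows, cols = len(img), len(img[0])
    length = 1  # count the pixel itself
    for sign in (1, -1):  # walk both ways along the line
        rr, cc = r + sign * dr, c + sign * dc
        while 0 <= rr < rows and 0 <= cc < cols and img[rr][cc]:
            length += 1
            rr += sign * dr
            cc += sign * dc
    return length


def direction_contribution(img, r, c):
    """Run-lengths in the four directions, scaled to unit Euclidean norm."""
    lengths = [run_length(img, r, c, dr, dc) for dr, dc in DIRECTIONS]
    norm = sqrt(sum(l * l for l in lengths))
    return [l / norm for l in lengths]


def direction_feature(img):
    """Average direction contribution over all foreground pixels."""
    total = [0.0] * len(DIRECTIONS)
    count = 0
    for r, row in enumerate(img):
        for c, pixel in enumerate(row):
            if pixel:
                for i, d in enumerate(direction_contribution(img, r, c)):
                    total[i] += d
                count += 1
    return [t / count for t in total] if count else total


# A single horizontal stroke: the horizontal component should dominate.
stroke = [
    [0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1],
    [0, 0, 0, 0, 0],
]
feature = direction_feature(stroke)
```

Because the feature depends on run-lengths measured through stroke pixels, noise that breaks or merges strokes changes the runs directly, which is consistent with the abstract's remark that geometric features are not robust to image degradation.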
Similar resources
Neural Network Based Recognition System Integrating Feature Extraction and Classification for English Handwritten
Handwriting recognition has been one of the active and challenging research areas in the field of image processing and pattern recognition. It has numerous applications, including reading aids for the blind, bank cheque processing, and conversion of handwritten documents into structured text. Neural Network (NN) with its inherent learning ability offers promising solutions for handwritten characte...
Grayscale Feature Combination in Recognition based Segmentation for Degraded Text String Recognition
Grayscale features are very effective for degraded character recognition. While many papers focus on different feature extraction algorithms for single character recognition, few deal with the impact of the selected feature on segmentation. For recognition-based segmentation, good recognition performance on single characters does not always translate into good segmentation performance. In this paper, tw...
Robust Feature Extraction Based on Run-Length Compensation for Degraded Handwritten Character Recognition
Conventional features are robust for recognizing either deformed or degraded characters. This paper proposes a feature extraction method that is robust for both of them. Run-length compensation is introduced for extracting approximate directional run-lengths of strokes from degraded handwritten characters. This technique is applied to the conventional feature vector based on directional runleng...
Generalization of Hindi OCR Using Adaptive Segmentation and Font Files
In this chapter, we describe an adaptive Indic OCR system implemented as part of a rapidly retargetable language tool effort and extend work found in [20, 2]. The system includes script identification, character segmentation, training sample creation, and character recognition. For script identification, Hindi words are identified in bilingual or multilingual document images using features of t...
Structural Run Based Feature Vector to Classify Printed Tamil Characters Using Neural Network
Feature extraction plays a crucial role in character recognition. Selecting a stable and representative set of features is the main problem in pattern recognition. Because of the font characteristics and style variation of machine-printed Tamil characters, feature extraction remains a problem. Feature extraction involves reducing the amount of resources required to describe a ...
Prototype Extraction and Adaptive OCR
To maintain OCR accuracy with decreasing quality of page image composition, production, and digitization, it is essential to tune the system to each document. We propose a prototype extraction method for document-specific OCR systems. The method automatically generates training samples from unsegmented text images and the corresponding transcripts. It is tolerant of transcription errors, so a ...